Abstract —We study the complementary behaviors of external 
and internal examples in image restoration, and are motivated 
to formulate a composite dictionary design framework. The 
composite dictionary consists of the global part learned from 
external examples, and the sample-specific part learned from 
internal examples. The dictionary atoms in both parts are 
further adaptively weighted to emphasize their model statistics. 
Experiments demonstrate that the joint utilization of external 
and internal examples leads to substantial improvements, with 
successful applications in image denoising and super resolution. 


I. Introduction 

Sparse coding (SC), which represents a signal as a sparse
linear combination of representation bases (dictionary atoms),
has been widely applied [1]. The dictionary, which should
both faithfully represent the signal and effectively extract
task-specific features, plays an important role [2]. For image
restoration, classical methods either rely on a large external
set of image examples [?], or find self-similar examples
from the input [?]. With much progress being made, it has
recently been recognized that external and internal examples each
suffer from certain drawbacks, but their properties may be
complementary [?], [?].

We believe the joint utilization of external and internal 
examples in dictionary design is crucial for further improving 
image restoration. We thus formulate a new composite dictionary
design framework for image restoration tasks. Successful
applications in image denoising and super resolution (SR) 
demonstrate its effectiveness. 

II. Related Work 

The problem to be investigated resembles a general problem
in SC-based classification: how to adaptively build the
relationship between dictionary atoms and class labels? Based on
predefined relationships, current supervised dictionary learning
(DL) methods are categorized into either learning a dictionary
shared by all classes, which may be compact but not
sufficiently discriminative [?]; or a class-specific dictionary
with the opposite properties [18]. In [13], the authors jointly
learned a composite dictionary combining class-specific and
shared dictionary atoms, with a latent matrix indicating the
relationship between dictionary atoms and labels.

In analogy to the classification case, reconstruction-purpose
dictionaries have been built from either external or internal
examples. External example-based methods are known for their
capability to produce plausible image appearances. However,
there is no guarantee that an arbitrary input patch can be well
matched or represented by a pre-fixed external set. When there
is rarely any match for the input, external examples are prone
to introduce either noise or oversmoothness [?]. Meanwhile,
the self-similarity property supplies internal examples that are
highly relevant to the input, but only in limited number.
Due to the insufficiency of internal examples, their mismatches
often result in severe visual artifacts [?].

The joint utilization of both external and self examples was
first studied for image denoising [?]. Mosseri et al. [?]
proposed that image patches have different preferences
towards either external or self examples for denoising. Such
a preference is in essence the tradeoff between noise-fitting
and signal-fitting. In [?], [?], a joint super-resolution (SR)
model was proposed to adaptively combine the advantages
of both external and self example-based loss functions. The work
in [?] further investigated the utilization of self-similarity in deep
learning-based SR. However, none of the prior work makes
much progress towards a unified dictionary design framework.

III. Technical Approach 

A. Overview 

The composite dictionary consists of the global dictionary 
part learned from external examples, and the sample-specific 
dictionary part learned from internal examples. The atoms in 
both parts are further weighted to exploit the different model 
statistics. Given input signal $x \in \mathbb{R}^{p \times 1}$, the formulation can
be mathematically presented as:

$$\min\; \lambda_E \sum_{i=1}^{M} \|a_i^G\|_1 + \lambda_I \sum_{j=1}^{N} \|a_j^S\|_1 + \Big\| x - \sum_{i=1}^{M} d_i^G\, M_G(d_i^G, x, \omega_G)\, a_i^G - \sum_{j=1}^{N} d_j^S\, M_I(d_j^S, x, \omega_S)\, a_j^S \Big\|_2^2 \tag{1}$$

Here $d_i^G \in \mathbb{R}^{p \times 1}$, $i = 1, 2, \ldots, M$ denotes the dictionary
atoms pre-learned from external examples, and $d_j^S \in \mathbb{R}^{p \times 1}$,
$j = 1, 2, \ldots, N$ the atoms pre-learned from internal examples.
We define $D^G = \{d_1^G, \ldots, d_M^G\}$ as the global base
dictionary and $D^S = \{d_1^S, \ldots, d_N^S\}$ as the sample-specific
base dictionary. $a \in \mathbb{R}^{(M+N) \times 1}$ denotes the sparse codes of $x$,
consisting of $a_i^G$, $i = 1, 2, \ldots, M$ and $a_j^S$, $j = 1, 2, \ldots, N$,
corresponding to $D^G$ and $D^S$, respectively. $\lambda_E$ and $\lambda_I$ are
constants. Note that both the first two terms, and the last two
summations in the third term of (1), can each be combined
together just like in conventional SC. We prefer writing them
separately in order to highlight the two different dictionary
parts. $M_G$ and $M_I$ denote some similarity-based weights
between the dictionary atoms and the input, parametrized by
$\omega_G$ and $\omega_S$, respectively.
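
To make the remark about combining the terms concrete, the following sketch rewrites (1), for fixed weights, as a conventional SC problem over one reweighted dictionary. The symbols $\tilde{D}$, $m_G$, $m_I$ and $\Lambda$ are shorthand introduced here for illustration only; they are not part of the original formulation.

```latex
\begin{aligned}
(m_G)_i &= M_G(d_i^G, x, \omega_G), \qquad (m_I)_j = M_I(d_j^S, x, \omega_S), \\
\tilde{D} &= \big[\, D^G \operatorname{diag}(m_G), \;\; D^S \operatorname{diag}(m_I) \,\big], \qquad
\Lambda = \operatorname{diag}\big(\lambda_E \mathbf{1}_M,\; \lambda_I \mathbf{1}_N\big), \\
&\min_{a}\; \|\Lambda a\|_1 + \|x - \tilde{D} a\|_2^2, \qquad a = \big[(a^G)^T, (a^S)^T\big]^T .
\end{aligned}
```

Under this reading, each weight simply rescales its atom before an ordinary weighted-$\ell_1$ sparse decomposition is performed.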

Our solution to (1) takes three steps:

• Obtain $D^G$ and $D^S$ prior to solving (1).

• Choose the desired forms of $M_G$ and $M_I$.

• Solve (1) using a coordinate-descent algorithm.

Note that the strategy is to first fix the base dictionaries and then
adapt them to the input by learning weights, which is close to [?].
Both lead to efficient and flexible dictionary representations.
Yet rather than simply enforcing sparsity constraints on the
weights [?], we aim to build a more adaptive relationship
between the input and the atoms, based on the complementary
example statistics. Additionally, while the base in [?] is
simply a DCT dictionary, our $D^G$ and $D^S$ are specifically
crafted from external and internal examples separately.

Algorithm 1 Coordinate descent algorithm for solving (1)

Require: $D^G$ and $D^S$; $\lambda_E$ and $\lambda_I$; ITER; $\beta$.

1: FOR t = 1 to ITER DO

2: Fix $\Omega_G$ and $\Omega_S$, solve (1) over A using the feature-sign
algorithm [?].

3: Fix $\Omega_G$ and A, solve $\Omega_S$ by taking gradient descent over
$F_S$, with step size $\beta$.

4: Fix $\Omega_S$ and A, solve $\Omega_G$ by taking gradient descent over
$F_G$, with step size $\beta$.

5: END FOR
Ensure: A, $\Omega_G$ and $\Omega_S$


B. Algorithm 

First of all, we learn $D^G$ and $D^S$ from the sets of external
and internal examples respectively (e.g., by K-SVD [1]).
Before moving on to learn functional forms of $M_G$ and $M_I$,
it is interesting to examine whether a static, but well-defined
weight could help. Without loss of generality, we assume both
$M_G$ and $M_I$ take values in [0, 1]. When $M_G$ (or $M_I$)
becomes larger, i.e., the current atom is highly correlated to
the input, the atom will be more favored by the penalty. We
first define $M_G$ and $M_I$ both in the form
of radial basis function (RBF) kernels:

$$M_G(d_i^G, x, \omega_G) = \exp\big(-\omega_G \|d_i^G - x\|_2^2\big), \qquad M_I(d_j^S, x, \omega_S) = \exp\big(-\omega_S \|d_j^S - x\|_2^2\big) \tag{2}$$

where $\omega_G$ and $\omega_S$ are both fixed constants, but of different
values. As discussed above, it is often more likely to find
“highly similar” examples internally than from external examples.
On the other hand, the external set can usually provide
more abundant “reasonably similar” (though not necessarily highly
similar) examples. We thus desire the value of $M_I$ to decrease
more quickly than $M_G$, which means $\omega_S$ is supposed to be
chosen larger than $\omega_G$. Note that when $\omega_G$ and $\omega_S$ are
fixed, (1) becomes a plain sparse decomposition problem that
can be solved efficiently.
Inspired by (2), it is straightforward to construct parameterized
$M_G$ and $M_I$ in the form of the Mahalanobis kernel [?]:

$$M_G(d_i^G, x, \Omega_G) = \exp\big(-(d_i^G - x)^T \Omega_G (d_i^G - x)\big), \qquad M_I(d_j^S, x, \Omega_S) = \exp\big(-(d_j^S - x)^T \Omega_S (d_j^S - x)\big) \tag{3}$$

Note that $\Omega_G$ and $\Omega_S$ are both positive semi-definite real
matrices. However, learning $\Omega_G$ or $\Omega_S$ directly requires
enforcing a positive semi-definite constraint during optimization,
which is expensive. A cheaper and well-known solution is to decompose
$\Omega_G = F_G^T F_G$ and $\Omega_S = F_S^T F_S$. Since $F_G$ and $F_S$ are
unconstrained real matrices, we can now cast the metric learning as an
unconstrained matrix optimization problem.
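
As a minimal sketch of the two weight families, the snippet below evaluates RBF weights and Mahalanobis weights with the factorization $\Omega = F^T F$. The function names, the toy data and the value 0.1 are illustrative assumptions, not taken from the original implementation.

```python
import numpy as np

def rbf_weights(D, x, omega):
    """RBF weights: exp(-omega * ||d_k - x||_2^2) for each column d_k of D."""
    diff = D - x[:, None]                      # p x K, column k is d_k - x
    return np.exp(-omega * np.sum(diff ** 2, axis=0))

def mahalanobis_weights(D, x, F):
    """Mahalanobis weights with Omega = F^T F, so that
    (d - x)^T Omega (d - x) = ||F (d - x)||_2^2 >= 0 for any real F."""
    proj = F @ (D - x[:, None])                # q x K
    return np.exp(-np.sum(proj ** 2, axis=0))

# Toy data (hypothetical): 25-dimensional patches and 4 atoms.
rng = np.random.default_rng(0)
x = rng.standard_normal(25)
D = rng.standard_normal((25, 4))
D[:, 0] = x                                    # an atom equal to the input

w_rbf = rbf_weights(D, x, omega=0.1)
w_mah = mahalanobis_weights(D, x, np.sqrt(0.1) * np.eye(25))
```

With $F = \sqrt{\omega}\, I$ the Mahalanobis weights reduce to the RBF weights, and an atom identical to the input receives weight 1, the upper end of the assumed range [0, 1].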

Summarizing the above, we solve (1) by a coordinate descent
algorithm, as detailed in Algorithm 1.
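
The alternating scheme can be sketched for a single patch as follows. This is an illustration under stated assumptions rather than the authors' implementation: ISTA is swapped in for the feature-sign algorithm, each weight update is a single gradient step, $F_G$ and $F_S$ start from exact identity matrices, and all function names are hypothetical.

```python
import numpy as np

def weights(D, x, F):
    """Mahalanobis weights exp(-||F (d_k - x)||^2) for each column d_k of D."""
    return np.exp(-np.sum((F @ (D - x[:, None])) ** 2, axis=0))

def ista(Dw, x, lam, n_steps=200):
    """Weighted-l1 sparse coding by ISTA (a stand-in for feature-sign search).
    Minimizes ||x - Dw a||_2^2 + sum_k lam[k] * |a_k|."""
    L = 2.0 * np.linalg.norm(Dw, 2) ** 2 + 1e-12   # Lipschitz constant of the smooth part
    a = np.zeros(Dw.shape[1])
    for _ in range(n_steps):
        z = a - 2.0 * Dw.T @ (Dw @ a - x) / L      # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a

def grad_F(D, x, F, a, r):
    """Gradient of ||r||_2^2 with respect to F, where r is the current residual
    and only the atoms in D (with codes a) depend on F."""
    V = D - x[:, None]                             # columns v_k = d_k - x
    c = 4.0 * a * (D.T @ r) * weights(D, x, F)     # per-atom scalar factor
    return F @ (V * c) @ V.T                       # sum_k c_k F v_k v_k^T

def coordinate_descent(DG, DS, x, lamE, lamI, n_iter=5, step=0.9):
    p, M = DG.shape
    FG, FS = np.eye(p), np.eye(p)
    lam = np.concatenate([np.full(M, lamE), np.full(DS.shape[1], lamI)])
    for _ in range(n_iter):
        # Fix F_G and F_S, solve for the sparse code a.
        Dw = np.hstack([DG * weights(DG, x, FG), DS * weights(DS, x, FS)])
        a = ista(Dw, x, lam)
        # Fix F_G and a, take one gradient step on F_S.
        r = x - Dw @ a
        FS = FS - step * grad_F(DS, x, FS, a[M:], r)
        # Fix F_S and a, take one gradient step on F_G.
        Dw = np.hstack([DG * weights(DG, x, FG), DS * weights(DS, x, FS)])
        r = x - Dw @ a
        FG = FG - step * grad_F(DG, x, FG, a[:M], r)
    return a, FG, FS
```

A call such as `coordinate_descent(DG, DS, x, lamE=0.01, lamI=0.1)` returns the sparse code together with the two learned factors; the regularization values in that call are placeholders, and a fixed step of 0.9 may need to be reduced on other data.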

IV. Experiments 

In our experiments, $x$ is by default a 5 x 5 image patch,
columnized to be a 25 x 1 vector; the external/internal examples
and resulting dictionary atoms share the same dimension.
We use the natural patches cropped from the Berkeley Segmentation
Dataset (BSD) as the collection of external examples.
The internal example candidates are cropped from the input
image with a spatial overlap of 1, to ensure a reasonably
sufficient amount. For image SR, the methodology is similar
but works on example pairs. We set M = 128 and N = 32
as the default dictionary sizes of $D^G$ and $D^S$.
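
The patch preparation described above can be sketched as below. The function name and the `stride` parameter are assumptions made for illustration; in particular, whether "a spatial overlap of 1" corresponds to a stride of 1 or to a one-pixel overlap is not specified, so the stride is left configurable. Note also that NumPy flattens row by row, whereas a MATLAB implementation would columnize column by column.

```python
import numpy as np

def extract_patches(img, size=5, stride=1):
    """Return a (size*size, n_patches) matrix whose columns are vectorized patches."""
    H, W = img.shape
    cols = [img[i:i + size, j:j + size].reshape(-1)
            for i in range(0, H - size + 1, stride)
            for j in range(0, W - size + 1, stride)]
    return np.stack(cols, axis=1)

img = np.arange(64, dtype=float).reshape(8, 8)   # toy 8 x 8 "image"
P = extract_patches(img)                          # one 25-vector per patch position
```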

In Algorithm 1, $F_G$ and $F_S$ are both initialized to be identity
matrices with some random disturbance on the diagonal
elements. When handling (2), we fine-tune $\omega_G$ and $\omega_S$ by
cross-validation. The solved A from (2) serves as a proper
initialization in (3). $\lambda_E$ and $\lambda_I$ vary by applications and will
be tuned, but we find it universally applicable to set $\lambda_I$ around
10 times $\lambda_E$. The step size $\beta$ is fixed at 0.9, and ITER is 5 for all.
When well initialized, the current MATLAB implementation
takes no more than 5 iterations to converge, and each iteration
consumes 10-15 minutes for a 256 x 256 image.

A. KNN versus KSVD: The Power of Weights 

As the choice of base dictionaries can be quite flexible 
per application requirements, it is interesting to evaluate if 
we could rely on simple, computationally cheap bases for a 
comparable performance to the sophisticated ones. 

We construct the “KNN base dictionaries”: for $D^G$, we use
a K-NN clustering over the external examples, and M cluster
centroids are obtained. For $D^S$, we simply obtain the N closest
internal examples from the input image. We then compare the
following methods in a typical image denoising setting:

• Method I. Solving (1) using the KNN base dictionaries $D^G$ and $D^S$.

• Method II. Performing conventional SC over the composite
dictionary of $D^G$ and $D^S$ (equivalent to letting
$M_G = M_I = 1$), as a benchmark.

• Method III. The K-SVD algorithm [1] is first applied
to either external or internal examples, to obtain the
global and sample-specific K-SVD dictionaries, respectively.
The two K-SVD dictionaries are concatenated into
a composite dictionary, over which SC is performed.
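
The KNN base dictionaries can be sketched as follows. The text does not detail the clustering procedure, so the snippet assumes a plain k-means (Lloyd) iteration for the centroids of $D^G$ and a Euclidean nearest-neighbour search for $D^S$; the function names are hypothetical.

```python
import numpy as np

def kmeans_centroids(X, M, n_iter=20, seed=0):
    """Plain Lloyd iterations on the columns of X; returns a (p, M) centroid matrix."""
    rng = np.random.default_rng(seed)
    C = X[:, rng.choice(X.shape[1], size=M, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((X[:, :, None] - C[:, None, :]) ** 2).sum(axis=0)   # (n, M) squared distances
        labels = d2.argmin(axis=1)
        for k in range(M):
            if np.any(labels == k):               # keep the old centroid if a cluster empties
                C[:, k] = X[:, labels == k].mean(axis=1)
    return C

def nearest_examples(X, x, N):
    """Return the N columns of X closest to x in Euclidean distance."""
    d2 = ((X - x[:, None]) ** 2).sum(axis=0)
    return X[:, np.argsort(d2)[:N]]
```

Here `kmeans_centroids(external_patches, M=128)` would play the role of $D^G$ and `nearest_examples(internal_patches, x, N=32)` that of $D^S$, where both patch matrices are assumed to hold one vectorized example per column.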

Five natural images, Lena, Barbara, Boats, House and Peppers,
are used for testing, with Gaussian noise of standard deviation
$\sigma$ = 10. As shown in Table I, it is impressive to see that Method I,
which relies on re-weighting the simplest KNN base dictionaries
in (1), is competitive with the canonical K-SVD dictionaries
(Method III), outperforming them on three of the five images.
We also see a large average margin of about 3.8 dB of Method I over
Method II in terms of PSNR, which clearly manifests the
benefits of modeling and learning proper weights.

TABLE I

Comparison of PSNRs (dB) using three different methods.

             Lena    Barbara   Boats   House   Peppers
Method I     35.57   33.98     33.83   33.56   34.93
Method II    31.21   30.41     31.24   29.43   30.67
Method III   35.36   34.24     33.62   34.76   34.32
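
The averages behind this comparison can be recomputed directly from Table I. The short script below is only a convenience check on the reported numbers; the variable names are arbitrary.

```python
import numpy as np

# PSNR values (dB) copied from Table I; columns: Lena, Barbara, Boats, House, Peppers.
psnr = {
    "Method I":   [35.57, 33.98, 33.83, 33.56, 34.93],
    "Method II":  [31.21, 30.41, 31.24, 29.43, 30.67],
    "Method III": [35.36, 34.24, 33.62, 34.76, 34.32],
}
avg = {name: float(np.mean(vals)) for name, vals in psnr.items()}
margin_I_over_II = avg["Method I"] - avg["Method II"]
wins_I_over_III = int(np.sum(np.array(psnr["Method I"]) > np.array(psnr["Method III"])))
```

The average margin of Method I over Method II comes out at about 3.78 dB, and Method I is ahead of Method III on three of the five images.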


B. Application I: Image Denoising 

Image denoising is one of the most classical application scenarios for
SC and DL. Each image is processed in a patch-wise manner
with a spatial overlap of 1. For the best performance, we use
the global and sample-specific K-SVD dictionaries, obtained
in Method III above, as our base dictionaries $D^G$ and $D^S$.
We compare the following methods on the five natural images:

• KSVD G denotes SC directly performed over the global
K-SVD dictionary $D^G$.

• KSVD S denotes SC directly performed over the
sample-specific K-SVD dictionary $D^S$.

• KSVD C denotes SC directly performed over the composite
dictionary of $D^G$ and $D^S$.

• SC FW denotes “SC with fixed weights” by solving (1)
over the composite dictionary of $D^G$ and $D^S$, with $M_G$
and $M_I$ defined in (2).

• SC LW denotes “SC with learned weights” by solving
(1) over the composite dictionary of $D^G$ and $D^S$, with
$M_G$ and $M_I$ defined in (3).

$\sigma$ varies from 10 to 50, with a stride of 10. For each method,
the average PSNRs over all five images under various $\sigma$ values
are reported in Table II. The proposed SC LW outperforms all
others in every case, by margins between roughly 0.7 dB and 2.6 dB
over the second-best method.

The learned weights help SC LW outperform SC FW
remarkably. When $\sigma$ grows larger, the performance of SC FW
degrades quickly. This can be interpreted as the (static) RBF
weights becoming less reliable in describing the correlations
between the noisy input patch and the dictionary atoms, especially
those from $D^G$, which are cropped from noise-free images. In
contrast, the learned weights show better robustness. Also, it is
not a surprise to see that the utilization of joint examples leads
to the consistent superiority of KSVD C over either KSVD G
or KSVD S.

TABLE II

Comparison of PSNRs (dB) among different denoising methods.

          $\sigma$ = 10   $\sigma$ = 20   $\sigma$ = 30   $\sigma$ = 40   $\sigma$ = 50
KSVD G    33.57           30.18           28.83           26.43           25.32
KSVD S    34.23           31.02           28.94           26.66           25.48
KSVD C    34.46           32.24           29.62           26.76           25.67
SC FW     34.83           33.45           30.28           26.27           25.32
SC LW     36.27           34.24           32.83           28.76           26.32

Interestingly, by comparing KSVD S and KSVD G, we
observe that internal examples gain advantages over external
ones under small $\sigma$. For large $\sigma$, the results of KSVD S
deteriorate faster, and its margin over KSVD G in Table II shrinks
from 0.84 dB at $\sigma$ = 20 to 0.16 dB at $\sigma$ = 50. The performance
margin of KSVD C over KSVD G is
also reduced at large $\sigma$. Such observations imply that
when the noise becomes heavy, internal examples are overly
corrupted and cannot provide relevant references well. This
coincides with the conclusion in [?], and further inspires us to
investigate the ratio of the $D^G$ size to the $D^S$ size, denoted as $r$.

External versus Internal. Table III is one more set of convincing
results to demonstrate the complementary behaviors of
joint examples. Provided the total number of dictionary atoms
is fixed at 160, Table III lists how the average PSNR of SC
LW changes with $r$, where $D^G$ has $160r/(1+r)$ atoms and $D^S$
has $160/(1+r)$ atoms (previously $r$ = 4). As shown by Table III,
increasing $r$ from 4 to 7 leads to improved PSNRs in heavy
noise cases ($\sigma$ = 40 and 50). However, neither an overly large
nor an overly small $r$ leads to any performance gain. On the one hand,
the PSNR generally drops rapidly as $r$ decreases below 4.
This indicates the key role of external examples in reconstructing
high-quality patches under mild noise conditions. On the other
hand, the PSNR also becomes poor when $r$ = 15. The study
of $r$ offers further support for the importance of
learning composite dictionaries from joint examples.

TABLE III

Comparison of PSNRs (dB) of SC LW under different $r$ and $\sigma$ values.

          $\sigma$ = 10   $\sigma$ = 20   $\sigma$ = 30   $\sigma$ = 40   $\sigma$ = 50
r = 0     30.47           28.48           27.67           26.20           24.06
r = 1     32.23           30.27           29.95           26.82           25.03
r = 3     35.46           33.27           31.80           28.97           26.21
r = 4     36.27           34.24           32.83           28.76           26.32
r = 7     36.18           34.05           32.58           28.94           26.37
r = 9     36.07           33.84           31.73           28.31           26.02
r = 15    34.57           31.28           30.80           27.67           25.83
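
For reference, the dictionary sizes implied by each ratio in Table III follow from the two expressions $160r/(1+r)$ and $160/(1+r)$. The helper below, whose name is chosen here for illustration, evaluates them for the tabulated values of $r$.

```python
def split_atoms(r, total=160):
    """Number of atoms in D^G and D^S for ratio r = |D^G| / |D^S| and a fixed total."""
    n_global = round(total * r / (1 + r))
    return n_global, total - n_global

sizes = {r: split_atoms(r) for r in [0, 1, 3, 4, 7, 9, 15]}
```

For the default $r$ = 4 this gives 128 global and 32 sample-specific atoms, matching M and N in the experimental setup; $r$ = 0 uses internal atoms only.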


C. Application II: Image SR 

The proposed method can be applied to image SR
problems with a variant extension. First of all, the example pools
are no longer collections of image patches, but instead example
pairs, each consisting of a high-resolution (HR) patch and its
low-resolution (LR) counterpart. Coupled dictionary learning has
been proposed in [18] to learn a dictionary pair from a large
external set of example pairs. Formulated mathematically,
the HR and LR patch spaces $\{X_{ij}\}$ and $\{Y_{ij}\}$ are
assumed to be tied by some mapping function. With a well-trained
coupled dictionary pair $(D_h, D_l)$, it is assumed that $(X_{ij},
Y_{ij})$ tends to admit a common sparse representation $a_{ij}$. Yang
et al. [18] suggested first inferring the sparse code of $Y_{ij}$
with respect to $D_l$, and then using it as an approximation of
$a_{ij}$ (the sparse code of $X_{ij}$ with respect to $D_h$), in order to
recover $X_{ij} \approx D_h a_{ij}$.
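
The two-stage recovery just described can be written compactly as below. The $\ell_1$-penalized least-squares form of the inference step and the constant $\lambda$ are assumptions made for illustration, since the text only states that a sparse code is inferred with respect to $D_l$.

```latex
\hat{a}_{ij} = \arg\min_{a}\; \|Y_{ij} - D_l\, a\|_2^2 + \lambda \|a\|_1,
\qquad
\hat{X}_{ij} = D_h\, \hat{a}_{ij}.
```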

We construct the pool of external example pairs in the
same way as [?]. The pool of internal example pairs is less
straightforward to construct, since the “ground-truth” HR image
of the LR input is not available. To overcome this difficulty, we
adopt an idea inspired by [?]. Based on the observation
that singular features like edges and corners in small patches
tend to repeat almost identically across different image scales,
Freedman and Fattal [?] applied the “high frequency transfer”
method to search for the high-frequency component of a target
HR patch, by NN patch matching across scales. Defining a
linear interpolation operator $\mathcal{U}$ and a downsampling operator
$\mathcal{D}$, for the input LR image $Y$ we first obtain its initial
upsampled image $X' = \mathcal{U}(Y)$ and a smoothed input image
$Y' = \mathcal{D}(\mathcal{U}(Y))$. Given the smoothed patch $X'_{ij}$, the missing
high-frequency band of each unknown patch $X^E_{ij}$ is predicted
by first solving an NN matching problem:

$$(m, n) = \arg\min_{(m,n) \in W_{ij}} \|Y'_{mn} - X'_{ij}\|_F^2, \tag{4}$$

where $W_{ij}$ is defined as a small local searching window on
image $Y'$. With the co-located patch $Y_{mn}$ from $Y$, the
high-frequency band $Y_{mn} - Y'_{mn}$ is pasted onto $X'_{ij}$, i.e.,
$X^E_{ij} = X'_{ij} + Y_{mn} - Y'_{mn}$. In this way, for the $(i, j)$-th patch
of the LR input $Y$, we can treat $X^E_{ij}$ as its corresponding HR
patch and make them an internal example pair.
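
The matching and pasting steps can be sketched for one patch as follows. The upsampled image and the smoothed input are assumed to be computed beforehand; the function name, the window radius, and the choice of centring the search window at the patch position divided by the scale factor are illustrative assumptions.

```python
import numpy as np

def hf_transfer_patch(X_up, Y, Y_smooth, i, j, size=5, radius=2, scale=3):
    """Predict one HR patch by high-frequency transfer.

    X_up     : upsampled image X' = U(Y)
    Y        : LR input image
    Y_smooth : smoothed input Y' = D(U(Y)), same shape as Y
    (i, j)   : top-left corner of the query patch in X_up
    """
    q = X_up[i:i + size, j:j + size]
    H, W = Y_smooth.shape
    ci = min(i // scale, H - size)                 # assumed window centre, clamped to the image
    cj = min(j // scale, W - size)
    best, best_mn = np.inf, None
    for m in range(max(ci - radius, 0), min(ci + radius, H - size) + 1):
        for n in range(max(cj - radius, 0), min(cj + radius, W - size) + 1):
            err = np.sum((Y_smooth[m:m + size, n:n + size] - q) ** 2)
            if err < best:                         # nearest neighbour within the window
                best, best_mn = err, (m, n)
    m, n = best_mn
    hf = Y[m:m + size, n:n + size] - Y_smooth[m:m + size, n:n + size]
    return q + hf, best_mn                         # paste the high-frequency band onto X'_ij
```

The returned patch, paired with the corresponding LR patch, would form one internal example pair.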

TABLE IV

Comparison of PSNRs (dB) using different SR methods.

                       Temple   Train   Leopard
Bicubic                25.29    26.14   24.14
Yang et al. [18]       26.20    26.58   25.32
Freedman and Fattal    21.17    22.54   23.04
Proposed               26.86    27.44   25.62


We then apply the coupled dictionary learning algorithm in
a similar manner (with the same K, M and N) as using K-SVD
above, obtaining the global dictionary pair $(D^G_h, D^G_l)$
and the sample-specific dictionary pair $(D^S_h, D^S_l)$. We solve
(1) over $D^G_l$ and $D^S_l$, after which we export the sparse codes
as well as the learned weight values to reconstruct the final
HR patches by $D^G_h$ and $D^S_h$. Comparison experiments are
conducted against bicubic interpolation, Yang et al.'s external
example-based SR method [18], and Freedman and Fattal's internal
example-based method [?], on three test images, Temple, Train
and Leopard, with a scaling factor of 3. While our method is not
specifically optimized for image SR, it obtains better SR
results than the other two competitive methods [18], [?].


V. Conclusion 

We propose a novel composite dictionary design framework.
The composite dictionary consists of global and sample-specific
parts, learned from external and internal examples,
respectively. We formulate the similarity weights that adaptively
correlate sparse codes with base dictionary atoms. Experiments
demonstrate that the joint utilization of external and internal
examples outperforms either stand-alone alternative. The
applications in image denoising and SR show great potential
along this research line.